Overview

Dataset statistics

Number of variables: 20
Number of observations: 5630
Missing cells: 1856
Missing cells (%): 1.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 879.8 KiB
Average record size in memory: 160.0 B

Variable types

Numeric: 11
Categorical: 9

Warnings

Tenure has 264 (4.7%) missing values Missing
WarehouseToHome has 251 (4.5%) missing values Missing
HourSpendOnApp has 255 (4.5%) missing values Missing
OrderAmountHikeFromlastYear has 265 (4.7%) missing values Missing
CouponUsed has 256 (4.5%) missing values Missing
OrderCount has 258 (4.6%) missing values Missing
DaySinceLastOrder has 307 (5.5%) missing values Missing
CustomerID is uniformly distributed Uniform
CustomerID has unique values Unique
Tenure has 508 (9.0%) zeros Zeros
CouponUsed has 1030 (18.3%) zeros Zeros
DaySinceLastOrder has 496 (8.8%) zeros Zeros
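
The overview figures can be cross-checked against the per-variable warnings. The following is a minimal sketch in plain Python that uses only numbers reported above; the name `missing_by_variable` is illustrative and not part of the report.

```python
# Missing-value counts copied from the Warnings list above.
missing_by_variable = {
    "Tenure": 264,
    "WarehouseToHome": 251,
    "HourSpendOnApp": 255,
    "OrderAmountHikeFromlastYear": 265,
    "CouponUsed": 256,
    "OrderCount": 258,
    "DaySinceLastOrder": 307,
}

n_variables = 20
n_observations = 5630

# Total missing cells is the sum over the affected variables.
missing_cells = sum(missing_by_variable.values())

# Share of missing cells relative to all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total memory (KiB converted to bytes) per row.
avg_record_bytes = 879.8 * 1024 / n_observations

print(missing_cells)               # 1856
print(round(missing_pct, 1))       # 1.6
print(round(avg_record_bytes, 1))  # 160.0
```

The seven per-variable counts add up to the reported 1856 missing cells, so no other variable contributes missing values.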

Reproduction

Analysis started: 2021-02-22 03:44:25.615083
Analysis finished: 2021-02-22 03:44:49.826625
Duration: 24.21 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml
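
The reported duration follows directly from the two timestamps. A small sketch with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamp layout used in the Reproduction block.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-02-22 03:44:25.615083", fmt)
finished = datetime.strptime("2021-02-22 03:44:49.826625", fmt)

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # 24.21 seconds
```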

Variables

CustomerID
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 5630
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52815.5
Minimum: 50001
Maximum: 55630
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 50001
5-th percentile: 50282.45
Q1: 51408.25
Median: 52815.5
Q3: 54222.75
95-th percentile: 55348.55
Maximum: 55630
Range: 5629
Interquartile range (IQR): 2814.5

Descriptive statistics

Standard deviation: 1625.385339
Coefficient of variation (CV): 0.03077477897
Kurtosis: -1.2
Mean: 52815.5
Median Absolute Deviation (MAD): 1407.5
Skewness: 0
Sum: 297351265
Variance: 2641877.5
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
51199 | 1 | < 0.1%
54027 | 1 | < 0.1%
54011 | 1 | < 0.1%
51964 | 1 | < 0.1%
54015 | 1 | < 0.1%
51968 | 1 | < 0.1%
54019 | 1 | < 0.1%
51972 | 1 | < 0.1%
54023 | 1 | < 0.1%
51976 | 1 | < 0.1%
Other values (5620) | 5620 | 99.8%

Value | Count | Frequency (%)
50001 | 1 | < 0.1%
50002 | 1 | < 0.1%
50003 | 1 | < 0.1%
50004 | 1 | < 0.1%
50005 | 1 | < 0.1%

Value | Count | Frequency (%)
55630 | 1 | < 0.1%
55629 | 1 | < 0.1%
55628 | 1 | < 0.1%
55627 | 1 | < 0.1%
55626 | 1 | < 0.1%
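
The CustomerID statistics can be reproduced without the original file. The sketch below rests on one inference, not stated in the report itself: 5630 unique integer IDs with minimum 50001 and maximum 55630 must be the consecutive integers in that range. It uses sample variance (n - 1) and linearly interpolated quartiles, which match the reported values.

```python
import statistics

# Inferred from the report: 5630 unique integer IDs spanning 50001..55630
# can only be the consecutive integers in that range.
customer_ids = list(range(50001, 55631))

total = sum(customer_ids)
mean = statistics.mean(customer_ids)
variance = statistics.variance(customer_ids)  # sample variance (n - 1)
std = statistics.stdev(customer_ids)
cv = std / mean                               # coefficient of variation

# Quartiles with linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(customer_ids, n=4, method="inclusive")

# Median absolute deviation around the median.
mad = statistics.median(abs(x - median) for x in customer_ids)

print(total)           # 297351265
print(mean)            # 52815.5
print(q1, median, q3)  # 51408.25 52815.5 54222.75
print(q3 - q1)         # 2814.5
print(mad)             # 1407.5
```

An excess kurtosis of -1.2 and a skewness of 0 are what a uniform distribution produces, which is consistent with the UNIFORM warning on this variable.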

Churn
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

0: 4682
1: 948

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 5630
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Value | Count | Frequency (%)
0 | 4682 | 83.2%
1 | 948 | 16.8%
Histogram of lengths of the category
ValueCountFrequency (%)
04682
83.2%
1948
 
16.8%

Most occurring characters

ValueCountFrequency (%)
04682
83.2%
1948
 
16.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5630
100.0%

Most frequent character per category

ValueCountFrequency (%)
04682
83.2%
1948
 
16.8%

Most occurring scripts

ValueCountFrequency (%)
Common5630
100.0%

Most frequent character per script

ValueCountFrequency (%)
04682
83.2%
1948
 
16.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII5630
100.0%

Most frequent character per block

ValueCountFrequency (%)
04682
83.2%
1948
 
16.8%

Tenure
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 36
Distinct (%): 0.7%
Missing: 264
Missing (%): 4.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.18989937
Minimum: 0
Maximum: 61
Zeros: 508
Zeros (%): 9.0%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 9
Q3: 16
95-th percentile: 27
Maximum: 61
Range: 61
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 8.557240984
Coefficient of variation (CV): 0.8397767904
Kurtosis: -0.007369469517
Mean: 10.18989937
Median Absolute Deviation (MAD): 7
Skewness: 0.7365133839
Sum: 54679
Variance: 73.22637326
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=36)
Value | Count | Frequency (%)
1 | 690 | 12.3%
0 | 508 | 9.0%
8 | 263 | 4.7%
9 | 247 | 4.4%
7 | 221 | 3.9%
10 | 213 | 3.8%
5 | 204 | 3.6%
4 | 203 | 3.6%
3 | 195 | 3.5%
11 | 194 | 3.4%
Other values (26) | 2428 | 43.1%
(Missing) | 264 | 4.7%

Value | Count | Frequency (%)
0 | 508 | 9.0%
1 | 690 | 12.3%
2 | 167 | 3.0%
3 | 195 | 3.5%
4 | 203 | 3.6%

Value | Count | Frequency (%)
61 | 1 | < 0.1%
60 | 1 | < 0.1%
51 | 1 | < 0.1%
50 | 1 | < 0.1%
31 | 49 | 0.9%

PreferredLoginDevice
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

Mobile Phone: 2765
Computer: 1634
Phone: 1231

Length

Max length: 12
Median length: 8
Mean length: 9.308525755
Min length: 5

Characters and Unicode

Total characters: 52407
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mobile Phone
2nd row: Phone
3rd row: Phone
4th row: Phone
5th row: Phone

Value | Count | Frequency (%)
Mobile Phone | 2765 | 49.1%
Computer | 1634 | 29.0%
Phone | 1231 | 21.9%
Histogram of lengths of the category
ValueCountFrequency (%)
phone3996
47.6%
mobile2765
32.9%
computer1634
19.5%

Most occurring characters

ValueCountFrequency (%)
o8395
16.0%
e8395
16.0%
P3996
7.6%
h3996
7.6%
n3996
7.6%
M2765
 
5.3%
b2765
 
5.3%
i2765
 
5.3%
l2765
 
5.3%
2765
 
5.3%
Other values (6)9804
18.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter41247
78.7%
Uppercase Letter8395
 
16.0%
Space Separator2765
 
5.3%

Most frequent character per category

ValueCountFrequency (%)
o8395
20.4%
e8395
20.4%
h3996
9.7%
n3996
9.7%
b2765
 
6.7%
i2765
 
6.7%
l2765
 
6.7%
m1634
 
4.0%
p1634
 
4.0%
u1634
 
4.0%
Other values (2)3268
 
7.9%
ValueCountFrequency (%)
P3996
47.6%
M2765
32.9%
C1634
19.5%
ValueCountFrequency (%)
2765
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin49642
94.7%
Common2765
 
5.3%

Most frequent character per script

ValueCountFrequency (%)
o8395
16.9%
e8395
16.9%
P3996
8.0%
h3996
8.0%
n3996
8.0%
M2765
 
5.6%
b2765
 
5.6%
i2765
 
5.6%
l2765
 
5.6%
C1634
 
3.3%
Other values (5)8170
16.5%
ValueCountFrequency (%)
2765
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII52407
100.0%

Most frequent character per block

ValueCountFrequency (%)
o8395
16.0%
e8395
16.0%
P3996
7.6%
h3996
7.6%
n3996
7.6%
M2765
 
5.3%
b2765
 
5.3%
i2765
 
5.3%
l2765
 
5.3%
2765
 
5.3%
Other values (6)9804
18.7%

CityTier
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

1: 3666
3: 1722
2: 242

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 5630
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 1
4th row: 3
5th row: 1

Value | Count | Frequency (%)
1 | 3666 | 65.1%
3 | 1722 | 30.6%
2 | 242 | 4.3%
Histogram of lengths of the category
ValueCountFrequency (%)
13666
65.1%
31722
30.6%
2242
 
4.3%

Most occurring characters

ValueCountFrequency (%)
13666
65.1%
31722
30.6%
2242
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5630
100.0%

Most frequent character per category

ValueCountFrequency (%)
13666
65.1%
31722
30.6%
2242
 
4.3%

Most occurring scripts

ValueCountFrequency (%)
Common5630
100.0%

Most frequent character per script

ValueCountFrequency (%)
13666
65.1%
31722
30.6%
2242
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII5630
100.0%

Most frequent character per block

ValueCountFrequency (%)
13666
65.1%
31722
30.6%
2242
 
4.3%

WarehouseToHome
Real number (ℝ≥0)

MISSING

Distinct: 34
Distinct (%): 0.6%
Missing: 251
Missing (%): 4.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.63989589
Minimum: 5
Maximum: 127
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 5
5-th percentile: 6
Q1: 9
Median: 14
Q3: 20
95-th percentile: 33
Maximum: 127
Range: 122
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 8.531475187
Coefficient of variation (CV): 0.545494372
Kurtosis: 9.986930421
Mean: 15.63989589
Median Absolute Deviation (MAD): 5
Skewness: 1.619153668
Sum: 84127
Variance: 72.78606886
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=34)
Value | Count | Frequency (%)
9 | 559 | 9.9%
8 | 444 | 7.9%
7 | 389 | 6.9%
16 | 322 | 5.7%
14 | 299 | 5.3%
6 | 295 | 5.2%
15 | 288 | 5.1%
10 | 274 | 4.9%
13 | 249 | 4.4%
11 | 233 | 4.1%
Other values (24) | 2027 | 36.0%
(Missing) | 251 | 4.5%

Value | Count | Frequency (%)
5 | 8 | 0.1%
6 | 295 | 5.2%
7 | 389 | 6.9%
8 | 444 | 7.9%
9 | 559 | 9.9%

Value | Count | Frequency (%)
127 | 1 | < 0.1%
126 | 1 | < 0.1%
36 | 51 | 0.9%
35 | 93 | 1.7%
34 | 63 | 1.1%

PreferredPaymentMode
Categorical

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

Debit Card: 2314
Credit Card: 1501
E wallet: 614
UPI: 414
COD: 365
Other values (2): 422

Length

Max length: 16
Median length: 10
Mean length: 8.85079929
Min length: 2

Characters and Unicode

Total characters: 49830
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Debit Card
2nd row: UPI
3rd row: Debit Card
4th row: Debit Card
5th row: CC

Value | Count | Frequency (%)
Debit Card | 2314 | 41.1%
Credit Card | 1501 | 26.7%
E wallet | 614 | 10.9%
UPI | 414 | 7.4%
COD | 365 | 6.5%
CC | 273 | 4.8%
Cash on Delivery | 149 | 2.6%
Histogram of lengths of the category
ValueCountFrequency (%)
card3815
36.8%
debit2314
22.3%
credit1501
 
14.5%
e614
 
5.9%
wallet614
 
5.9%
upi414
 
4.0%
cod365
 
3.5%
cc273
 
2.6%
cash149
 
1.4%
delivery149
 
1.4%

Most occurring characters

ValueCountFrequency (%)
C6376
12.8%
r5465
11.0%
d5316
10.7%
e4727
9.5%
4727
9.5%
a4578
9.2%
t4429
8.9%
i3964
8.0%
D2828
5.7%
b2314
 
4.6%
Other values (13)5106
10.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter33678
67.6%
Uppercase Letter11425
 
22.9%
Space Separator4727
 
9.5%

Most frequent character per category

ValueCountFrequency (%)
r5465
16.2%
d5316
15.8%
e4727
14.0%
a4578
13.6%
t4429
13.2%
i3964
11.8%
b2314
6.9%
l1377
 
4.1%
w614
 
1.8%
s149
 
0.4%
Other values (5)745
 
2.2%
ValueCountFrequency (%)
C6376
55.8%
D2828
24.8%
E614
 
5.4%
U414
 
3.6%
P414
 
3.6%
I414
 
3.6%
O365
 
3.2%
ValueCountFrequency (%)
4727
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin45103
90.5%
Common4727
 
9.5%

Most frequent character per script

ValueCountFrequency (%)
C6376
14.1%
r5465
12.1%
d5316
11.8%
e4727
10.5%
a4578
10.2%
t4429
9.8%
i3964
8.8%
D2828
6.3%
b2314
 
5.1%
l1377
 
3.1%
Other values (12)3729
8.3%
ValueCountFrequency (%)
4727
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII49830
100.0%

Most frequent character per block

ValueCountFrequency (%)
C6376
12.8%
r5465
11.0%
d5316
10.7%
e4727
9.5%
4727
9.5%
a4578
9.2%
t4429
8.9%
i3964
8.0%
D2828
5.7%
b2314
 
4.6%
Other values (13)5106
10.2%

Gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

Male: 3384
Female: 2246

Length

Max length: 6
Median length: 4
Mean length: 4.797868561
Min length: 4

Characters and Unicode

Total characters: 27012
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Male
3rd row: Male
4th row: Male
5th row: Male

Value | Count | Frequency (%)
Male | 3384 | 60.1%
Female | 2246 | 39.9%
Histogram of lengths of the category
ValueCountFrequency (%)
male3384
60.1%
female2246
39.9%

Most occurring characters

ValueCountFrequency (%)
e7876
29.2%
a5630
20.8%
l5630
20.8%
M3384
12.5%
F2246
 
8.3%
m2246
 
8.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter21382
79.2%
Uppercase Letter5630
 
20.8%

Most frequent character per category

ValueCountFrequency (%)
e7876
36.8%
a5630
26.3%
l5630
26.3%
m2246
 
10.5%
ValueCountFrequency (%)
M3384
60.1%
F2246
39.9%

Most occurring scripts

ValueCountFrequency (%)
Latin27012
100.0%

Most frequent character per script

ValueCountFrequency (%)
e7876
29.2%
a5630
20.8%
l5630
20.8%
M3384
12.5%
F2246
 
8.3%
m2246
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII27012
100.0%

Most frequent character per block

ValueCountFrequency (%)
e7876
29.2%
a5630
20.8%
l5630
20.8%
M3384
12.5%
F2246
 
8.3%
m2246
 
8.3%

HourSpendOnApp
Real number (ℝ≥0)

MISSING

Distinct: 6
Distinct (%): 0.1%
Missing: 255
Missing (%): 4.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.931534884
Minimum: 0
Maximum: 5
Zeros: 3
Zeros (%): 0.1%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 2
Median: 3
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.72192585
Coefficient of variation (CV): 0.2462620704
Kurtosis: -0.667076137
Mean: 2.931534884
Median Absolute Deviation (MAD): 1
Skewness: -0.02721262163
Sum: 15757
Variance: 0.5211769329
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%)
3 | 2687 | 47.7%
2 | 1471 | 26.1%
4 | 1176 | 20.9%
1 | 35 | 0.6%
5 | 3 | 0.1%
0 | 3 | 0.1%
(Missing) | 255 | 4.5%

Value | Count | Frequency (%)
0 | 3 | 0.1%
1 | 35 | 0.6%
2 | 1471 | 26.1%
3 | 2687 | 47.7%
4 | 1176 | 20.9%

Value | Count | Frequency (%)
5 | 3 | 0.1%
4 | 1176 | 20.9%
3 | 2687 | 47.7%
2 | 1471 | 26.1%
1 | 35 | 0.6%

NumberOfDeviceRegistered
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.688987567
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 4
Q3: 4
95-th percentile: 5
Maximum: 6
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.023998519
Coefficient of variation (CV): 0.2775825346
Kurtosis: 0.5828487316
Mean: 3.688987567
Median Absolute Deviation (MAD): 1
Skewness: -0.3969686435
Sum: 20769
Variance: 1.048572967
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%)
4 | 2377 | 42.2%
3 | 1699 | 30.2%
5 | 881 | 15.6%
2 | 276 | 4.9%
1 | 235 | 4.2%
6 | 162 | 2.9%

Value | Count | Frequency (%)
1 | 235 | 4.2%
2 | 276 | 4.9%
3 | 1699 | 30.2%
4 | 2377 | 42.2%
5 | 881 | 15.6%

Value | Count | Frequency (%)
6 | 162 | 2.9%
5 | 881 | 15.6%
4 | 2377 | 42.2%
3 | 1699 | 30.2%
2 | 276 | 4.9%

PreferedOrderCat
Categorical

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

Laptop & Accessory: 2050
Mobile Phone: 1271
Fashion: 826
Mobile: 809
Grocery: 410

Length

Max length: 18
Median length: 12
Mean length: 11.94351687
Min length: 6

Characters and Unicode

Total characters: 67242
Distinct characters: 23
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Laptop & Accessory
2nd row: Mobile
3rd row: Mobile
4th row: Laptop & Accessory
5th row: Mobile

Value | Count | Frequency (%)
Laptop & Accessory | 2050 | 36.4%
Mobile Phone | 1271 | 22.6%
Fashion | 826 | 14.7%
Mobile | 809 | 14.4%
Grocery | 410 | 7.3%
Others | 264 | 4.7%
Histogram of lengths of the category
ValueCountFrequency (%)
mobile2080
18.9%
accessory2050
18.6%
laptop2050
18.6%
2050
18.6%
phone1271
11.6%
fashion826
 
7.5%
grocery410
 
3.7%
others264
 
2.4%

Most occurring characters

ValueCountFrequency (%)
o8687
 
12.9%
e6075
 
9.0%
5371
 
8.0%
s5190
 
7.7%
c4510
 
6.7%
p4100
 
6.1%
r3134
 
4.7%
i2906
 
4.3%
a2876
 
4.3%
y2460
 
3.7%
Other values (13)21933
32.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter50870
75.7%
Uppercase Letter8951
 
13.3%
Space Separator5371
 
8.0%
Other Punctuation2050
 
3.0%

Most frequent character per category

ValueCountFrequency (%)
o8687
17.1%
e6075
11.9%
s5190
10.2%
c4510
8.9%
p4100
8.1%
r3134
 
6.2%
i2906
 
5.7%
a2876
 
5.7%
y2460
 
4.8%
h2361
 
4.6%
Other values (4)8571
16.8%
ValueCountFrequency (%)
M2080
23.2%
L2050
22.9%
A2050
22.9%
P1271
14.2%
F826
 
9.2%
G410
 
4.6%
O264
 
2.9%
ValueCountFrequency (%)
5371
100.0%
ValueCountFrequency (%)
&2050
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin59821
89.0%
Common7421
 
11.0%

Most frequent character per script

ValueCountFrequency (%)
o8687
14.5%
e6075
 
10.2%
s5190
 
8.7%
c4510
 
7.5%
p4100
 
6.9%
r3134
 
5.2%
i2906
 
4.9%
a2876
 
4.8%
y2460
 
4.1%
h2361
 
3.9%
Other values (11)17522
29.3%
ValueCountFrequency (%)
5371
72.4%
&2050
 
27.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII67242
100.0%

Most frequent character per block

ValueCountFrequency (%)
o8687
 
12.9%
e6075
 
9.0%
5371
 
8.0%
s5190
 
7.7%
c4510
 
6.7%
p4100
 
6.1%
r3134
 
4.7%
i2906
 
4.3%
a2876
 
4.3%
y2460
 
3.7%
Other values (13)21933
32.6%

SatisfactionScore
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

3: 1698
1: 1164
5: 1108
4: 1074
2: 586

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 5630
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 3
3rd row: 3
4th row: 5
5th row: 5

Value | Count | Frequency (%)
3 | 1698 | 30.2%
1 | 1164 | 20.7%
5 | 1108 | 19.7%
4 | 1074 | 19.1%
2 | 586 | 10.4%
Histogram of lengths of the category
ValueCountFrequency (%)
31698
30.2%
11164
20.7%
51108
19.7%
41074
19.1%
2586
 
10.4%

Most occurring characters

ValueCountFrequency (%)
31698
30.2%
11164
20.7%
51108
19.7%
41074
19.1%
2586
 
10.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5630
100.0%

Most frequent character per category

ValueCountFrequency (%)
31698
30.2%
11164
20.7%
51108
19.7%
41074
19.1%
2586
 
10.4%

Most occurring scripts

ValueCountFrequency (%)
Common5630
100.0%

Most frequent character per script

ValueCountFrequency (%)
31698
30.2%
11164
20.7%
51108
19.7%
41074
19.1%
2586
 
10.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII5630
100.0%

Most frequent character per block

ValueCountFrequency (%)
31698
30.2%
11164
20.7%
51108
19.7%
41074
19.1%
2586
 
10.4%

MaritalStatus
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

Married: 2986
Single: 1796
Divorced: 848

Length

Max length: 8
Median length: 7
Mean length: 6.831616341
Min length: 6

Characters and Unicode

Total characters: 38462
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Single
2nd row: Single
3rd row: Single
4th row: Single
5th row: Single

Value | Count | Frequency (%)
Married | 2986 | 53.0%
Single | 1796 | 31.9%
Divorced | 848 | 15.1%
Histogram of lengths of the category
ValueCountFrequency (%)
married2986
53.0%
single1796
31.9%
divorced848
 
15.1%

Most occurring characters

ValueCountFrequency (%)
r6820
17.7%
i5630
14.6%
e5630
14.6%
d3834
10.0%
M2986
7.8%
a2986
7.8%
S1796
 
4.7%
n1796
 
4.7%
g1796
 
4.7%
l1796
 
4.7%
Other values (4)3392
8.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter32832
85.4%
Uppercase Letter5630
 
14.6%

Most frequent character per category

ValueCountFrequency (%)
r6820
20.8%
i5630
17.1%
e5630
17.1%
d3834
11.7%
a2986
9.1%
n1796
 
5.5%
g1796
 
5.5%
l1796
 
5.5%
v848
 
2.6%
o848
 
2.6%
ValueCountFrequency (%)
M2986
53.0%
S1796
31.9%
D848
 
15.1%

Most occurring scripts

ValueCountFrequency (%)
Latin38462
100.0%

Most frequent character per script

ValueCountFrequency (%)
r6820
17.7%
i5630
14.6%
e5630
14.6%
d3834
10.0%
M2986
7.8%
a2986
7.8%
S1796
 
4.7%
n1796
 
4.7%
g1796
 
4.7%
l1796
 
4.7%
Other values (4)3392
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII38462
100.0%

Most frequent character per block

ValueCountFrequency (%)
r6820
17.7%
i5630
14.6%
e5630
14.6%
d3834
10.0%
M2986
7.8%
a2986
7.8%
S1796
 
4.7%
n1796
 
4.7%
g1796
 
4.7%
l1796
 
4.7%
Other values (4)3392
8.8%

NumberOfAddress
Real number (ℝ≥0)

Distinct: 15
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.214031972
Minimum: 1
Maximum: 22
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 6
95-th percentile: 10
Maximum: 22
Range: 21
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.583585513
Coefficient of variation (CV): 0.6130911037
Kurtosis: 0.9592292732
Mean: 4.214031972
Median Absolute Deviation (MAD): 1
Skewness: 1.088639383
Sum: 23725
Variance: 6.674914101
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Value | Count | Frequency (%)
2 | 1369 | 24.3%
3 | 1278 | 22.7%
4 | 588 | 10.4%
5 | 571 | 10.1%
6 | 382 | 6.8%
1 | 371 | 6.6%
8 | 280 | 5.0%
7 | 256 | 4.5%
9 | 239 | 4.2%
10 | 194 | 3.4%
Other values (5) | 102 | 1.8%

Value | Count | Frequency (%)
1 | 371 | 6.6%
2 | 1369 | 24.3%
3 | 1278 | 22.7%
4 | 588 | 10.4%
5 | 571 | 10.1%

Value | Count | Frequency (%)
22 | 1 | < 0.1%
21 | 1 | < 0.1%
20 | 1 | < 0.1%
19 | 1 | < 0.1%
11 | 98 | 1.7%

Complain
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 KiB

0: 4026
1: 1604

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 5630
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 0
5th row: 0

Value | Count | Frequency (%)
0 | 4026 | 71.5%
1 | 1604 | 28.5%
Histogram of lengths of the category
ValueCountFrequency (%)
04026
71.5%
11604
 
28.5%

Most occurring characters

ValueCountFrequency (%)
04026
71.5%
11604
 
28.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5630
100.0%

Most frequent character per category

ValueCountFrequency (%)
04026
71.5%
11604
 
28.5%

Most occurring scripts

ValueCountFrequency (%)
Common5630
100.0%

Most frequent character per script

ValueCountFrequency (%)
04026
71.5%
11604
 
28.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII5630
100.0%

Most frequent character per block

ValueCountFrequency (%)
04026
71.5%
11604
 
28.5%

OrderAmountHikeFromlastYear
Real number (ℝ≥0)

MISSING

Distinct: 16
Distinct (%): 0.3%
Missing: 265
Missing (%): 4.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.70792171
Minimum: 11
Maximum: 26
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 11
5-th percentile: 11
Q1: 13
Median: 15
Q3: 18
95-th percentile: 23
Maximum: 26
Range: 15
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.675485463
Coefficient of variation (CV): 0.2339892908
Kurtosis: -0.2803811889
Mean: 15.70792171
Median Absolute Deviation (MAD): 3
Skewness: 0.7907853591
Sum: 84273
Variance: 13.50919339
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value | Count | Frequency (%)
14 | 750 | 13.3%
13 | 741 | 13.2%
12 | 728 | 12.9%
15 | 542 | 9.6%
11 | 391 | 6.9%
16 | 333 | 5.9%
18 | 321 | 5.7%
19 | 311 | 5.5%
17 | 297 | 5.3%
20 | 243 | 4.3%
Other values (6) | 708 | 12.6%
(Missing) | 265 | 4.7%

Value | Count | Frequency (%)
11 | 391 | 6.9%
12 | 728 | 12.9%
13 | 741 | 13.2%
14 | 750 | 13.3%
15 | 542 | 9.6%

Value | Count | Frequency (%)
26 | 33 | 0.6%
25 | 73 | 1.3%
24 | 84 | 1.5%
23 | 144 | 2.6%
22 | 184 | 3.3%

CouponUsed
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 17
Distinct (%): 0.3%
Missing: 256
Missing (%): 4.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.751023446
Minimum: 0
Maximum: 16
Zeros: 1030
Zeros (%): 18.3%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 2
95-th percentile: 6
Maximum: 16
Range: 16
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.894621447
Coefficient of variation (CV): 1.08200804
Kurtosis: 9.132281171
Mean: 1.751023446
Median Absolute Deviation (MAD): 1
Skewness: 2.545652562
Sum: 9410
Variance: 3.589590428
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Value | Count | Frequency (%)
1 | 2105 | 37.4%
2 | 1283 | 22.8%
0 | 1030 | 18.3%
3 | 327 | 5.8%
4 | 197 | 3.5%
5 | 129 | 2.3%
6 | 108 | 1.9%
7 | 89 | 1.6%
8 | 42 | 0.7%
10 | 14 | 0.2%
Other values (7) | 50 | 0.9%
(Missing) | 256 | 4.5%

Value | Count | Frequency (%)
0 | 1030 | 18.3%
1 | 2105 | 37.4%
2 | 1283 | 22.8%
3 | 327 | 5.8%
4 | 197 | 3.5%

Value | Count | Frequency (%)
16 | 2 | < 0.1%
15 | 1 | < 0.1%
14 | 5 | 0.1%
13 | 8 | 0.1%
12 | 9 | 0.2%

OrderCount
Real number (ℝ≥0)

MISSING

Distinct: 16
Distinct (%): 0.3%
Missing: 258
Missing (%): 4.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.008004468
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 3
95-th percentile: 9
Maximum: 16
Range: 15
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.939679548
Coefficient of variation (CV): 0.9772856323
Kurtosis: 4.718466052
Mean: 3.008004468
Median Absolute Deviation (MAD): 1
Skewness: 2.196414108
Sum: 16159
Variance: 8.641715846
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value | Count | Frequency (%)
2 | 2025 | 36.0%
1 | 1751 | 31.1%
3 | 371 | 6.6%
7 | 206 | 3.7%
4 | 204 | 3.6%
5 | 181 | 3.2%
8 | 172 | 3.1%
6 | 137 | 2.4%
9 | 62 | 1.1%
12 | 54 | 1.0%
Other values (6) | 209 | 3.7%
(Missing) | 258 | 4.6%

Value | Count | Frequency (%)
1 | 1751 | 31.1%
2 | 2025 | 36.0%
3 | 371 | 6.6%
4 | 204 | 3.6%
5 | 181 | 3.2%

Value | Count | Frequency (%)
16 | 23 | 0.4%
15 | 33 | 0.6%
14 | 36 | 0.6%
13 | 30 | 0.5%
12 | 54 | 1.0%

DaySinceLastOrder
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 22
Distinct (%): 0.4%
Missing: 307
Missing (%): 5.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.543490513
Minimum: 0
Maximum: 46
Zeros: 496
Zeros (%): 8.8%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 3
Q3: 7
95-th percentile: 11
Maximum: 46
Range: 46
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.654433197
Coefficient of variation (CV): 0.8043228409
Kurtosis: 4.023964341
Mean: 4.543490513
Median Absolute Deviation (MAD): 2
Skewness: 1.190999503
Sum: 24185
Variance: 13.35488199
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=22)
Value | Count | Frequency (%)
3 | 900 | 16.0%
2 | 792 | 14.1%
1 | 614 | 10.9%
8 | 538 | 9.6%
0 | 496 | 8.8%
7 | 447 | 7.9%
4 | 431 | 7.7%
9 | 299 | 5.3%
5 | 228 | 4.0%
10 | 157 | 2.8%
Other values (12) | 421 | 7.5%
(Missing) | 307 | 5.5%

Value | Count | Frequency (%)
0 | 496 | 8.8%
1 | 614 | 10.9%
2 | 792 | 14.1%
3 | 900 | 16.0%
4 | 431 | 7.7%

Value | Count | Frequency (%)
46 | 1 | < 0.1%
31 | 1 | < 0.1%
30 | 1 | < 0.1%
18 | 10 | 0.2%
17 | 17 | 0.3%

CashbackAmount
Real number (ℝ≥0)

Distinct: 220
Distinct (%): 3.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 177.221492
Minimum: 0
Maximum: 325
Zeros: 4
Zeros (%): 0.1%
Memory size: 44.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 123
Q1: 146
Median: 163
Q3: 196
95-th percentile: 292
Maximum: 325
Range: 325
Interquartile range (IQR): 50

Descriptive statistics

Standard deviation: 49.19386891
Coefficient of variation (CV): 0.2775841031
Kurtosis: 0.9735461687
Mean: 177.221492
Median Absolute Deviation (MAD): 23
Skewness: 1.149594996
Sum: 997757
Variance: 2420.036738
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
148 | 134 | 2.4%
149 | 120 | 2.1%
146 | 118 | 2.1%
152 | 116 | 2.1%
153 | 108 | 1.9%
123 | 97 | 1.7%
151 | 95 | 1.7%
154 | 89 | 1.6%
147 | 87 | 1.5%
150 | 84 | 1.5%
Other values (210) | 4582 | 81.4%

Value | Count | Frequency (%)
0 | 4 | 0.1%
12 | 1 | < 0.1%
25 | 4 | 0.1%
37 | 1 | < 0.1%
56 | 1 | < 0.1%

Value | Count | Frequency (%)
325 | 4 | 0.1%
324 | 6 | 0.1%
323 | 6 | 0.1%
322 | 10 | 0.2%
321 | 12 | 0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
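
The calculation described above can be sketched in plain Python. The function name `pearson_r` is illustrative; the two lists are the CouponUsed and OrderCount values of the ten "First rows" in the Sample section, so the result says nothing about the full dataset.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# CouponUsed and OrderCount from the ten "First rows" of the sample.
coupon_used = [1, 0, 0, 0, 1, 4, 0, 2, 0, 1]
order_count = [1, 1, 1, 1, 1, 6, 1, 2, 1, 1]

r = pearson_r(coupon_used, order_count)

# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([10 * v + 5 for v in coupon_used], order_count)

print(round(r, 3), round(r_rescaled, 3))
```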

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
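
As a rough illustration of the rank-based calculation, the sketch below ranks both variables (ties receive the mean of their ranks, a common convention assumed here) and then applies the Pearson formula to the ranks. The helper names and the toy data are illustrative, not taken from the report.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(round(spearman_rho(x, y), 3))  # 1.0
print(pearson_r(x, y) < 1.0)         # True
```

The toy data shows the point made above: the relationship is perfectly monotonic, so ρ reaches 1, while Pearson's r stays below 1 because the relationship is not linear.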

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
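
The pair-counting definition can be written down directly. This sketch implements exactly the formula in the text (often called tau-a); statistical libraries commonly default to a tie-adjusted variant, so results may differ when the data contains ties. The function name and toy data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # product == 0 is a tie and counts as neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau, 3))  # 0.667
```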

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
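
A minimal sketch of the bias correction, written from the published formula and not taken from the report's own code: the chi-squared statistic is divided by n, a correction term is subtracted, and the table dimensions are shrunk accordingly. The function name and both contingency tables are hypothetical.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction: subtract the expected inflation and shrink the dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Hypothetical tables: rows and columns independent, then perfectly associated.
independent = [[20, 30], [40, 60]]
perfect = [[50, 0], [0, 50]]

print(cramers_v_corrected(independent))        # 0.0
print(round(cramers_v_corrected(perfect), 6))  # 1.0
```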

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
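
Nullity correlation is the ordinary correlation between presence indicators of two columns. The sketch below assumes the Pearson formula and uses only the Tenure and HourSpendOnApp values of the ten "First rows" in the Sample section, so the number it prints describes those ten rows and not the full dataset. The helper names are illustrative.

```python
from math import sqrt, isnan

nan = float("nan")

# Tenure and HourSpendOnApp for the ten "First rows" of the sample.
tenure = [4.0, nan, nan, 0.0, 0.0, 0.0, nan, nan, 13.0, nan]
hours = [3.0, 3.0, 2.0, 2.0, nan, 3.0, 2.0, 3.0, nan, 2.0]

def presence(values):
    """1 where a value is present, 0 where it is missing (NaN)."""
    return [0 if isnan(v) else 1 for v in values]

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

nullity_corr = pearson_r(presence(tenure), presence(hours))
print(round(nullity_corr, 2))  # -0.5
```

A negative value means that, in these rows, one column tends to be present where the other is missing.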

Sample

First rows

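Index | CustomerID | Churn | Tenure | PreferredLoginDevice | CityTier | WarehouseToHome | PreferredPaymentMode | Gender | HourSpendOnApp | NumberOfDeviceRegistered | PreferedOrderCat | SatisfactionScore | MaritalStatus | NumberOfAddress | Complain | OrderAmountHikeFromlastYear | CouponUsed | OrderCount | DaySinceLastOrder | CashbackAmount
0 | 50001 | 1 | 4.0 | Mobile Phone | 3 | 6.0 | Debit Card | Female | 3.0 | 3 | Laptop & Accessory | 2 | Single | 9 | 1 | 11.0 | 1.0 | 1.0 | 5.0 | 160
1 | 50002 | 1 | NaN | Phone | 1 | 8.0 | UPI | Male | 3.0 | 4 | Mobile | 3 | Single | 7 | 1 | 15.0 | 0.0 | 1.0 | 0.0 | 121
2 | 50003 | 1 | NaN | Phone | 1 | 30.0 | Debit Card | Male | 2.0 | 4 | Mobile | 3 | Single | 6 | 1 | 14.0 | 0.0 | 1.0 | 3.0 | 120
3 | 50004 | 1 | 0.0 | Phone | 3 | 15.0 | Debit Card | Male | 2.0 | 4 | Laptop & Accessory | 5 | Single | 8 | 0 | 23.0 | 0.0 | 1.0 | 3.0 | 134
4 | 50005 | 1 | 0.0 | Phone | 1 | 12.0 | CC | Male | NaN | 3 | Mobile | 5 | Single | 3 | 0 | 11.0 | 1.0 | 1.0 | 3.0 | 130
5 | 50006 | 1 | 0.0 | Computer | 1 | 22.0 | Debit Card | Female | 3.0 | 5 | Mobile Phone | 5 | Single | 2 | 1 | 22.0 | 4.0 | 6.0 | 7.0 | 139
6 | 50007 | 1 | NaN | Phone | 3 | 11.0 | Cash on Delivery | Male | 2.0 | 3 | Laptop & Accessory | 2 | Divorced | 4 | 0 | 14.0 | 0.0 | 1.0 | 0.0 | 121
7 | 50008 | 1 | NaN | Phone | 1 | 6.0 | CC | Male | 3.0 | 3 | Mobile | 2 | Divorced | 3 | 1 | 16.0 | 2.0 | 2.0 | 0.0 | 123
8 | 50009 | 1 | 13.0 | Phone | 3 | 9.0 | E wallet | Male | NaN | 4 | Mobile | 3 | Divorced | 2 | 1 | 14.0 | 0.0 | 1.0 | 2.0 | 127
9 | 50010 | 1 | NaN | Phone | 1 | 31.0 | Debit Card | Male | 2.0 | 5 | Mobile | 3 | Single | 2 | 0 | 12.0 | 1.0 | 1.0 | 1.0 | 123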

Last rows

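Index | CustomerID | Churn | Tenure | PreferredLoginDevice | CityTier | WarehouseToHome | PreferredPaymentMode | Gender | HourSpendOnApp | NumberOfDeviceRegistered | PreferedOrderCat | SatisfactionScore | MaritalStatus | NumberOfAddress | Complain | OrderAmountHikeFromlastYear | CouponUsed | OrderCount | DaySinceLastOrder | CashbackAmount
5620 | 55621 | 0 | 3.0 | Mobile Phone | 1 | 35.0 | Credit Card | Female | 4.0 | 5 | Mobile Phone | 5 | Single | 3 | 0 | 15.0 | 1.0 | 2.0 | 5.0 | 163
5621 | 55622 | 1 | 14.0 | Mobile Phone | 3 | 35.0 | E wallet | Male | 3.0 | 5 | Fashion | 5 | Married | 6 | 1 | 14.0 | 3.0 | NaN | 1.0 | 234
5622 | 55623 | 0 | 13.0 | Mobile Phone | 3 | 31.0 | E wallet | Female | 3.0 | 5 | Grocery | 1 | Married | 2 | 0 | 12.0 | 4.0 | NaN | 7.0 | 245
5623 | 55624 | 0 | 5.0 | Computer | 1 | 12.0 | Credit Card | Male | 4.0 | 4 | Laptop & Accessory | 5 | Single | 2 | 0 | 20.0 | 2.0 | 2.0 | NaN | 224
5624 | 55625 | 0 | 1.0 | Mobile Phone | 3 | 12.0 | UPI | Female | 2.0 | 5 | Mobile Phone | 3 | Single | 2 | 0 | 19.0 | 2.0 | 2.0 | 1.0 | 155
5625 | 55626 | 0 | 10.0 | Computer | 1 | 30.0 | Credit Card | Male | 3.0 | 2 | Laptop & Accessory | 1 | Married | 6 | 0 | 18.0 | 1.0 | 2.0 | 4.0 | 151
5626 | 55627 | 0 | 13.0 | Mobile Phone | 1 | 13.0 | Credit Card | Male | 3.0 | 5 | Fashion | 5 | Married | 6 | 0 | 16.0 | 1.0 | 2.0 | NaN | 225
5627 | 55628 | 0 | 1.0 | Mobile Phone | 1 | 11.0 | Debit Card | Male | 3.0 | 2 | Laptop & Accessory | 4 | Married | 3 | 1 | 21.0 | 1.0 | 2.0 | 4.0 | 186
5628 | 55629 | 0 | 23.0 | Computer | 3 | 9.0 | Credit Card | Male | 4.0 | 5 | Laptop & Accessory | 4 | Married | 4 | 0 | 15.0 | 2.0 | 2.0 | 9.0 | 179
5629 | 55630 | 0 | 8.0 | Mobile Phone | 1 | 15.0 | Credit Card | Male | 3.0 | 2 | Laptop & Accessory | 3 | Married | 4 | 0 | 13.0 | 2.0 | 2.0 | 3.0 | 169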